Penalized Linear Unbiased Selection
Abstract
We introduce MC+, a fast, continuous, nearly unbiased, and accurate method of penalized variable selection in high-dimensional linear regression. The LASSO is fast and continuous, but biased, and its bias interferes with variable selection. Subset selection is unbiased but computationally costly. The MC+ has two elements: a minimax concave penalty (MCP) and a penalized linear unbiased selection (PLUS) algorithm. The MCP provides the minimum non-convexity of the penalized loss given the level of bias. The PLUS computes multiple local minimizers of a possibly non-convex penalized loss function in a certain main branch of the graph of such solutions. Its output is a continuous piecewise linear path extending from the origin to an optimal solution for zero penalty. We prove that, for a universal penalty level, the MC+ has a high probability of correct selection under much weaker conditions than those in existing results for the LASSO for large n and p, including the case of p ≫ n. We provide estimates of the noise level for proper choice of the penalty level. We choose the sparsest solution within the PLUS path for a given penalty level. We derive degrees of freedom and Cp-type risk estimates for general penalized LSE, including the LASSO estimator, and prove their unbiasedness. We provide necessary and sufficient conditions for the continuity of the penalized LSE under general sub-square penalties. Simulation results overwhelmingly support our claim of superior variable selection properties and demonstrate the computational efficiency of the proposed method.
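The abstract contrasts the biased LASSO with the nearly unbiased MCP but does not state the penalty itself. The sketch below assumes the commonly cited form of the MCP, ρ(t; λ, γ) = λ ∫₀^|t| (1 − x/(γλ))₊ dx with γ > 1, and compares the resulting one-dimensional threshold rule with LASSO soft thresholding. The function names and the parameter names `lam` and `gamma` are illustrative choices, and the code covers only the scalar (orthonormal design) case; it is not the PLUS path algorithm.

```python
import numpy as np


def mcp_penalty(t, lam, gamma):
    """MCP value, assuming rho(t) = lam * int_0^|t| max(0, 1 - x/(gamma*lam)) dx."""
    a = np.abs(np.asarray(t, dtype=float))
    inner = lam * a - a**2 / (2.0 * gamma)  # concave part, |t| <= gamma * lam
    flat = 0.5 * gamma * lam**2             # constant part, |t| > gamma * lam
    return np.where(a <= gamma * lam, inner, flat)


def soft_threshold(z, lam):
    """LASSO solution of min_b 0.5*(z - b)^2 + lam*|b|: shrinks every survivor by lam."""
    z = np.asarray(z, dtype=float)
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def mcp_threshold(z, lam, gamma):
    """Scalar minimizer of 0.5*(z - b)^2 + mcp_penalty(b, lam, gamma), for gamma > 1."""
    if gamma <= 1.0:
        raise ValueError("gamma must exceed 1 for a unique scalar minimizer")
    z = np.asarray(z, dtype=float)
    # Rescaled soft threshold below gamma*lam; the input is returned unchanged above it.
    shrunk = soft_threshold(z, lam) / (1.0 - 1.0 / gamma)
    return np.where(np.abs(z) <= gamma * lam, shrunk, z)


if __name__ == "__main__":
    z = np.array([0.5, 2.0, 5.0])
    print("soft:", soft_threshold(z, lam=1.0))
    print("mcp: ", mcp_threshold(z, lam=1.0, gamma=3.0))
```

Under these assumptions, a large input such as 5.0 passes through the MCP rule unchanged, while soft thresholding still subtracts λ from it. This is the bias that the abstract attributes to the LASSO.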
Similar resources
Penalized Bregman Divergence Estimation via Coordinate Descent
Variable selection via penalized estimation is appealing for dimension reduction. For penalized linear regression, Efron et al. (2004) introduced the LARS algorithm. Recently, the coordinate descent (CD) algorithm was developed by Friedman et al. (2007) for penalized linear regression and penalized logistic regression and was shown to be computationally superior. This paper explores...
Comparing Different Marker Densities and Various Reference Populations Using Pedigree-Marker Best Linear Unbiased Prediction (BLUP) Model
For genomic selection to be applied successfully, the reference population and marker density must be chosen properly. The purpose of this study was to investigate the accuracy of genomic estimated breeding values at low (5K), intermediate (50K) and high (777K) marker densities in simulated populations, under different scenarios for selecting the reference population. Af...
A Framework for Unbiased Model Selection Based on Boosting
Variable selection and model choice are of major concern in many statistical applications, especially in high-dimensional regression models. Boosting is a convenient statistical method that combines model fitting with intrinsic model selection. We investigate the impact of base-learner specification on the performance of boosting as a model selection procedure. We show that variable selection m...
Variable Selection via Penalized Likelihood
Variable selection is vital to statistical data analyses. Many of the procedures in use are ad hoc stepwise selection procedures, which are computationally expensive and ignore stochastic errors in the variable selection process of previous steps. An automatic and simultaneous variable selection procedure can be obtained by using a penalized likelihood method. In traditional linear models, the best...
Estimation and Selection via Absolute Penalized Convex Minimization And Its Multistage Adaptive Applications
The ℓ1-penalized method, or the Lasso, has emerged as an important tool for the analysis of large data sets. Many important results have been obtained for the Lasso in linear regression which have led to a deeper understanding of high-dimensional statistical problems. In this article, we consider a class of weighted ℓ1-penalized estimators for convex loss functions of a general form, including ...
Publication date: 2007